Overview

Dataset statistics

Number of variables: 18
Number of observations: 47926
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.1 MiB
Average record size in memory: 287.5 B

Variable types

Numeric: 13
Categorical: 4
DateTime: 1

Alerts

name has a high cardinality: 46963 distinct values (High cardinality)
host_name has a high cardinality: 11425 distinct values (High cardinality)
df_index is highly correlated with id and 1 other fields (High correlation)
id is highly correlated with df_index and 1 other fields (High correlation)
host_id is highly correlated with df_index and 1 other fields (High correlation)
neighbourhood_group is highly correlated with latitude (High correlation)
latitude is highly correlated with neighbourhood_group (High correlation)
room_type is highly correlated with price (High correlation)
price is highly correlated with room_type (High correlation)
number_of_reviews is highly correlated with reviews_per_month (High correlation)
reviews_per_month is highly correlated with number_of_reviews (High correlation)
df_index is highly correlated with id (High correlation)
id is highly correlated with df_index (High correlation)
neighbourhood_group is highly correlated with neighbourhood and 2 other fields (High correlation)
neighbourhood is highly correlated with neighbourhood_group and 2 other fields (High correlation)
latitude is highly correlated with neighbourhood_group and 2 other fields (High correlation)
longitude is highly correlated with neighbourhood_group and 2 other fields (High correlation)
name is uniformly distributed (Uniform)
df_index has unique values (Unique)
id has unique values (Unique)
number_of_reviews has 9510 (19.8%) zeros (Zeros)
reviews_per_month has 9510 (19.8%) zeros (Zeros)
availability_365 has 17464 (36.4%) zeros (Zeros)
month has 2086 (4.4%) zeros (Zeros)
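
The percentages in the Zeros alerts are the zero counts divided by the number of observations (47926). The following is a minimal Python sketch of that arithmetic, using only figures reported above; the names `zero_counts` and `zero_share` are illustrative and not part of the report.

```python
# Number of observations reported in the dataset statistics.
n_observations = 47926

# Zero counts copied from the Zeros alerts.
zero_counts = {
    "number_of_reviews": 9510,
    "reviews_per_month": 9510,
    "availability_365": 17464,
    "month": 2086,
}


def zero_share(count: int, total: int) -> float:
    """Return the share of zeros as a percentage, rounded to one decimal."""
    return round(100 * count / total, 1)


zero_percentages = {
    name: zero_share(count, n_observations) for name, count in zero_counts.items()
}

for name, pct in zero_percentages.items():
    print(f"{name}: {pct}%")
```

The rounded shares should match the percentages shown in the alerts.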

Reproduction

Analysis started: 2021-10-05 09:28:16.606406
Analysis finished: 2021-10-05 09:29:01.507867
Duration: 44.9 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
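
The reported duration follows from the two timestamps above. This is a small sketch of the subtraction using only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamp layout used in the Reproduction block.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-10-05 09:28:16.606406", FMT)
finished = datetime.strptime("2021-10-05 09:29:01.507867", FMT)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.1f} seconds")
```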

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 47926
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24220.06001
Minimum: 0
Maximum: 48857
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2406.25
Q1: 12051.25
Median: 24149.5
Q3: 36208.75
95-th percentile: 46428.75
Maximum: 48857
Range: 48857
Interquartile range (IQR): 24157.5

Descriptive statistics

Standard deviation: 14061.80844
Coefficient of variation (CV): 0.5805852021
Kurtosis: -1.185390867
Mean: 24220.06001
Median Absolute Deviation (MAD): 12079
Skewness: 0.01948179953
Sum: 1160770596
Variance: 197734456.5
Monotonicity: Not monotonic
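
Several of the df_index figures above are derived from one another. The sketch below recomputes them from the reported values: range, IQR, coefficient of variation and variance. It uses the usual definitions (CV = standard deviation / mean, variance = standard deviation squared) and assumes the report follows them; small differences come from the rounding of the printed inputs.

```python
# Figures copied from the df_index statistics above.
mean = 24220.06001
std = 14061.80844
q1, q3 = 12051.25, 36208.75
minimum, maximum = 0, 48857

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance as squared standard deviation

print(f"Range    = {value_range}")
print(f"IQR      = {iqr}")
print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.1f}")
```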
Histogram with fixed size bins (bins=50)

Common values

Value                   Count    Frequency (%)
2047                    1        < 0.1%
40249                   1        < 0.1%
19779                   1        < 0.1%
17730                   1        < 0.1%
23873                   1        < 0.1%
21824                   1        < 0.1%
42302                   1        < 0.1%
48445                   1        < 0.1%
46396                   1        < 0.1%
36155                   1        < 0.1%
Other values (47916)    47916    > 99.9%

Minimum 10 values

Value                   Count    Frequency (%)
0                       1        < 0.1%
1                       1        < 0.1%
2                       1        < 0.1%
3                       1        < 0.1%
4                       1        < 0.1%
5                       1        < 0.1%
6                       1        < 0.1%
7                       1        < 0.1%
8                       1        < 0.1%
9                       1        < 0.1%

Maximum 10 values

Value                   Count    Frequency (%)
48857                   1        < 0.1%
48856                   1        < 0.1%
48855                   1        < 0.1%
48854                   1        < 0.1%
48853                   1        < 0.1%
48852                   1        < 0.1%
48851                   1        < 0.1%
48850                   1        < 0.1%
48849                   1        < 0.1%
48848                   1        < 0.1%

id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 47926
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18861822.12
Minimum: 2539
Maximum: 36487245
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 2539
5-th percentile: 1185399
Q1: 9386073.25
Median: 19475094
Q3: 28834379.75
95-th percentile: 35263722.75
Maximum: 36487245
Range: 36484706
Interquartile range (IQR): 19448306.5

Descriptive statistics

Standard deviation: 10951890.59
Coefficient of variation (CV): 0.5806379956
Kurtosis: -1.218210411
Mean: 18861822.12
Median Absolute Deviation (MAD): 9777786.5
Skewness: -0.07293591539
Sum: 9.039716868 × 10^11
Variance: 1.199439075 × 10^14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                   Count    Frequency (%)
11667455                1        < 0.1%
23560164                1        < 0.1%
28130741                1        < 0.1%
7851219                 1        < 0.1%
34944655                1        < 0.1%
5229212                 1        < 0.1%
22232269                1        < 0.1%
5581273                 1        < 0.1%
20055242                1        < 0.1%
11801800                1        < 0.1%
Other values (47916)    47916    > 99.9%

Minimum 10 values

Value                   Count    Frequency (%)
2539                    1        < 0.1%
2595                    1        < 0.1%
3647                    1        < 0.1%
3831                    1        < 0.1%
5022                    1        < 0.1%
5099                    1        < 0.1%
5121                    1        < 0.1%
5178                    1        < 0.1%
5203                    1        < 0.1%
5238                    1        < 0.1%

Maximum 10 values

Value                   Count    Frequency (%)
36487245                1        < 0.1%
36485609                1        < 0.1%
36485431                1        < 0.1%
36485057                1        < 0.1%
36484665                1        < 0.1%
36484363                1        < 0.1%
36484087                1        < 0.1%
36483152                1        < 0.1%
36483010                1        < 0.1%
36482809                1        < 0.1%

name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 46963
Distinct (%): 98.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.4 MiB

Hillside Hotel: 18
Home away from home: 17
New york Multi-unit building: 16
Brooklyn Apartment: 12
Loft Suite @ The Box House Hotel: 11
Other values (46958): 47852

Length

Max length: 179
Median length: 36
Mean length: 36.69745024
Min length: 1

Characters and Unicode

Total characters: 182
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 46326
Unique (%): 96.7%

Sample

1st row: Couch in Harlem Harvey Refugees only
2nd row: Beautiful room in Bushwick
3rd row: Quiet, Cozy UES Studio Near the Subway
4th row: Gigantic Sunny Room in Park Slope-Private Backyard
5th row: Jen Apt

Common Values

Value                                         Count    Frequency (%)
Hillside Hotel                                18       < 0.1%
Home away from home                           17       < 0.1%
New york Multi-unit building                  16       < 0.1%
Brooklyn Apartment                            12       < 0.1%
Loft Suite @ The Box House Hotel              11       < 0.1%
Private Room                                  11       < 0.1%
Artsy Private BR in Fort Greene Cumberland    10       < 0.1%
Private room                                  10       < 0.1%
Cozy Brooklyn Apartment                       8        < 0.1%
Private room in Brooklyn                      8        < 0.1%
Other values (46953)                          47805    99.7%

Length

Histogram of lengths of the category

Words

Value                   Count     Frequency (%)
in                      16604     5.7%
room                    9944      3.4%
bedroom                 7531      2.6%
                        7279      2.5%
private                 7132      2.5%
apartment               6648      2.3%
cozy                    4946      1.7%
apt                     4533      1.6%
brooklyn                4037      1.4%
studio                  3814      1.3%
Other values (12451)    217980    75.0%

Most occurring characters

Value                   Count     Frequency (%)
                        182       100.0%

Most occurring categories

Value                   Count     Frequency (%)
Control                 182       100.0%

Most frequent character per category

Control
Value                   Count     Frequency (%)
                        182       100.0%

Most occurring scripts

Value                   Count     Frequency (%)
Common                  182       100.0%

Most frequent character per script

Common
Value                   Count     Frequency (%)
                        182       100.0%

Most occurring blocks

Value                   Count     Frequency (%)
ASCII                   182       100.0%

Most frequent character per block

ASCII
Value                   Count     Frequency (%)
                        182       100.0%

host_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 37325
Distinct (%): 77.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 66366719.84
Minimum: 2438
Maximum: 274321313
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 2438
5-th percentile: 797420
Q1: 7598992.25
Median: 29863220.5
Q3: 104497453
95-th percentile: 242478115.2
Maximum: 274321313
Range: 274318875
Interquartile range (IQR): 96898460.75

Descriptive statistics

Standard deviation: 78126572.85
Coefficient of variation (CV): 1.177195031
Kurtosis: 0.2838830002
Mean: 66366719.84
Median Absolute Deviation (MAD): 26616062
Skewness: 1.246626958
Sum: 3.180691415 × 10^12
Variance: 6.103761385 × 10^15
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                   Count    Frequency (%)
12243051                96       0.2%
16098958                96       0.2%
61391963                91       0.2%
22541573                87       0.2%
200380610               65       0.1%
1475015                 52       0.1%
7503643                 52       0.1%
120762452               50       0.1%
205031545               49       0.1%
2856748                 49       0.1%
Other values (37315)    47239    98.6%

Minimum 10 values

Value                   Count    Frequency (%)
2438                    1        < 0.1%
2571                    1        < 0.1%
2787                    6        < 0.1%
2845                    2        < 0.1%
2868                    1        < 0.1%
2881                    2        < 0.1%
3151                    1        < 0.1%
3211                    1        < 0.1%
3415                    1        < 0.1%
3563                    1        < 0.1%

Maximum 10 values

Value                   Count    Frequency (%)
274321313               1        < 0.1%
274311461               1        < 0.1%
274307600               1        < 0.1%
274298453               1        < 0.1%
274273284               1        < 0.1%
274225617               1        < 0.1%
274195458               1        < 0.1%
274188386               1        < 0.1%
274103383               1        < 0.1%
274079964               1        < 0.1%

host_name
Categorical

HIGH CARDINALITY

Distinct: 11425
Distinct (%): 23.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

Michael: 417
David: 402
John: 293
Alex: 278
Daniel: 226
Other values (11420): 46310

Length

Max length: 35
Median length: 6
Mean length: 6.073592622
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 6883
Unique (%): 14.4%

Sample

1st row: Morgan
2nd row: Julio
3rd row: Amy
4th row: Rachel
5th row: Jennifer

Common Values

Value                   Count    Frequency (%)
Michael                 417      0.9%
David                   402      0.8%
John                    293      0.6%
Alex                    278      0.6%
Daniel                  226      0.5%
Sarah                   226      0.5%
Maria                   204      0.4%
Jessica                 202      0.4%
Mike                    193      0.4%
Andrew                  190      0.4%
Other values (11415)    45295    94.5%

Length

Histogram of lengths of the category

Words

Value                   Count    Frequency (%)
                        1120     2.1%
and                     622      1.2%
michael                 460      0.9%
david                   448      0.8%
john                    336      0.6%
alex                    329      0.6%
laura                   292      0.5%
maria                   244      0.5%
daniel                  242      0.5%
sarah                   239      0.4%
Other values (10237)    48970    91.9%

Most occurring characters

No values found.

Most occurring categories

No values found.

Most frequent character per category

Most occurring scripts

No values found.

Most frequent character per script

Most occurring blocks

No values found.

Most frequent character per block

neighbourhood_group
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB

2: 20852
1: 20041
3: 5574
0: 1087
4: 372

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 2
4th row: 1
5th row: 2

Common Values

Value    Count    Frequency (%)
2        20852    43.5%
1        20041    41.8%
3        5574     11.6%
0        1087     2.3%
4        372      0.8%

Length

Histogram of lengths of the category

Pie chart

Most occurring characters

No values found.

Most occurring categories

No values found.

Most frequent character per category

Most occurring scripts

No values found.

Most frequent character per script

Most occurring blocks

No values found.

Most frequent character per block

neighbourhood
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 221
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.9792806
Minimum: 0
Maximum: 220
Zeros: 42
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 13
Q1: 51
Median: 94
Q3: 178
95-th percentile: 214
Maximum: 220
Range: 220
Interquartile range (IQR): 127

Descriptive statistics

Standard deviation: 68.8965507
Coefficient of variation (CV): 0.6440177046
Kurtosis: -1.264789579
Mean: 106.9792806
Median Absolute Deviation (MAD): 60
Skewness: 0.2562315228
Sum: 5127089
Variance: 4746.734698
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count    Frequency (%)
214                   3910     8.2%
13                    3704     7.7%
94                    2635     5.5%
28                    2451     5.1%
202                   1934     4.0%
95                    1883     3.9%
64                    1832     3.8%
201                   1766     3.7%
51                    1560     3.3%
127                   1489     3.1%
Other values (211)    24762    51.7%

Minimum 10 values

Value                 Count    Frequency (%)
0                     42       0.1%
1                     4        < 0.1%
2                     21       < 0.1%
3                     77       0.2%
4                     889      1.9%
5                     17       < 0.1%
6                     65       0.1%
7                     140      0.3%
8                     6        < 0.1%
9                     2        < 0.1%

Maximum 10 values

Value                 Count    Frequency (%)
220                   200      0.4%
219                   1        < 0.1%
218                   11       < 0.1%
217                   88       0.2%
216                   157      0.3%
215                   1        < 0.1%
214                   3910     8.2%
213                   40       0.1%
212                   11       < 0.1%
211                   2        < 0.1%

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 18984
Distinct (%): 39.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.72879618
Minimum: 40.49979
Maximum: 40.91306
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 40.49979
5-th percentile: 40.645715
Q1: 40.68961
Median: 40.72278
Q3: 40.7633075
95-th percentile: 40.826025
Maximum: 40.91306
Range: 0.41327
Interquartile range (IQR): 0.0736975

Descriptive statistics

Standard deviation: 0.05487824725
Coefficient of variation (CV): 0.001347406562
Kurtosis: 0.122347916
Mean: 40.72879618
Median Absolute Deviation (MAD): 0.03655
Skewness: 0.2423492754
Sum: 1951968.286
Variance: 0.003011622021
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                   Count    Frequency (%)
40.71813                18       < 0.1%
40.69414                13       < 0.1%
40.68444                13       < 0.1%
40.68634                13       < 0.1%
40.71353                12       < 0.1%
40.68537                12       < 0.1%
40.71171                12       < 0.1%
40.76189                12       < 0.1%
40.71923                11       < 0.1%
40.7191                 11       < 0.1%
Other values (18974)    47799    99.7%

Minimum 10 values

Value                   Count    Frequency (%)
40.49979                1        < 0.1%
40.50641                1        < 0.1%
40.50708                1        < 0.1%
40.50868                1        < 0.1%
40.50873                1        < 0.1%
40.50943                1        < 0.1%
40.51133                1        < 0.1%
40.52211                1        < 0.1%
40.52293                1        < 0.1%
40.527                  1        < 0.1%

Maximum 10 values

Value                   Count    Frequency (%)
40.91306                1        < 0.1%
40.91234                1        < 0.1%
40.91169                1        < 0.1%
40.91167                1        < 0.1%
40.90804                1        < 0.1%
40.90734                1        < 0.1%
40.90527                1        < 0.1%
40.90484                1        < 0.1%
40.90406                1        < 0.1%
40.90391                1        < 0.1%

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 14578
Distinct (%): 30.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.95160063
Minimum: -74.24442
Maximum: -73.71299
Zeros: 0
Zeros (%): 0.0%
Negative: 47926
Negative (%): 100.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: -74.24442
5-th percentile: -74.0028575
Q1: -73.98242
Median: -73.95526
Q3: -73.93583
95-th percentile: -73.8645225
Maximum: -73.71299
Range: 0.53143
Interquartile range (IQR): 0.04659

Descriptive statistics

Standard deviation: 0.04616488066
Coefficient of variation (CV): -0.0006242580319
Kurtosis: 5.066036523
Mean: -73.95160063
Median Absolute Deviation (MAD): 0.02466
Skewness: 1.281648255
Sum: -3544204.412
Variance: 0.002131196206
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                   Count    Frequency (%)
-73.95427               18       < 0.1%
-73.95677               18       < 0.1%
-73.95405               17       < 0.1%
-73.95136               16       < 0.1%
-73.9506                16       < 0.1%
-73.95332               16       < 0.1%
-73.94791               16       < 0.1%
-73.95725               15       < 0.1%
-73.94537               15       < 0.1%
-73.98439               15       < 0.1%
Other values (14568)    47764    99.7%

Minimum 10 values

Value                   Count    Frequency (%)
-74.24442               1        < 0.1%
-74.24285               1        < 0.1%
-74.24084               1        < 0.1%
-74.23986               1        < 0.1%
-74.23914               1        < 0.1%
-74.23803               1        < 0.1%
-74.23059               1        < 0.1%
-74.21238               1        < 0.1%
-74.21017               1        < 0.1%
-74.20941               1        < 0.1%

Maximum 10 values

Value                   Count    Frequency (%)
-73.71299               1        < 0.1%
-73.7169                1        < 0.1%
-73.71795               1        < 0.1%
-73.71829               1        < 0.1%
-73.71928               1        < 0.1%
-73.72173               1        < 0.1%
-73.72179               1        < 0.1%
-73.72247               1        < 0.1%
-73.72435               1        < 0.1%
-73.72581               1        < 0.1%

room_type
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB

0: 24614
1: 22156
2: 1156

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 0
4th row: 0
5th row: 1

Common Values

Value    Count    Frequency (%)
0        24614    51.4%
1        22156    46.2%
2        1156     2.4%

Length

Histogram of lengths of the category

Pie chart

Most occurring characters

No values found.

Most occurring categories

No values found.

Most frequent character per category

Most occurring scripts

No values found.

Most frequent character per script

Most occurring blocks

No values found.

Most frequent character per block

price
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 587
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 142.7652422
Minimum: 10
Maximum: 1800
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 10
5-th percentile: 40
Q1: 69
Median: 105
Q3: 175
95-th percentile: 350
Maximum: 1800
Range: 1790
Interquartile range (IQR): 106

Descriptive statistics

Standard deviation: 131.4182141
Coefficient of variation (CV): 0.9205196729
Kurtosis: 29.28742523
Mean: 142.7652422
Median Absolute Deviation (MAD): 45
Skewness: 4.224543833
Sum: 6842167
Variance: 17270.747
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)

Common values

Value                 Count    Frequency (%)
150                   2043     4.3%
100                   2037     4.3%
50                    1526     3.2%
60                    1455     3.0%
200                   1394     2.9%
75                    1368     2.9%
80                    1272     2.7%
65                    1190     2.5%
70                    1166     2.4%
120                   1127     2.4%
Other values (577)    33348    69.6%

Minimum 10 values

Value                 Count    Frequency (%)
10                    17       < 0.1%
11                    3        < 0.1%
12                    4        < 0.1%
13                    1        < 0.1%
15                    6        < 0.1%
16                    6        < 0.1%
18                    2        < 0.1%
19                    4        < 0.1%
20                    33       0.1%
21                    6        < 0.1%

Maximum 10 values

Value                 Count    Frequency (%)
1800                  2        < 0.1%
1799                  1        < 0.1%
1795                  1        < 0.1%
1763                  1        < 0.1%
1750                  3        < 0.1%
1749                  1        < 0.1%
1731                  1        < 0.1%
1700                  4        < 0.1%
1680                  1        < 0.1%
1600                  4        < 0.1%

minimum_nights
Real number (ℝ≥0)

Distinct: 100
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.5220757
Minimum: 1
Maximum: 365
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 5
95-th percentile: 30
Maximum: 365
Range: 364
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 16.10130951
Coefficient of variation (CV): 2.468740053
Kurtosis: 219.5989898
Mean: 6.5220757
Median Absolute Deviation (MAD): 1
Skewness: 12.07727139
Sum: 312577
Variance: 259.2521678
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count    Frequency (%)
1                    12640    26.4%
2                    11490    24.0%
3                    7992     16.7%
30                   3294     6.9%
4                    3293     6.9%
5                    3029     6.3%
7                    2050     4.3%
6                    749      1.6%
14                   561      1.2%
10                   482      1.0%
Other values (90)    2346     4.9%

Minimum 10 values

Value                Count    Frequency (%)
1                    12640    26.4%
2                    11490    24.0%
3                    7992     16.7%
4                    3293     6.9%
5                    3029     6.3%
6                    749      1.6%
7                    2050     4.3%
8                    130      0.3%
9                    80       0.2%
10                   482      1.0%

Maximum 10 values

Value                Count    Frequency (%)
365                  27       0.1%
364                  1        < 0.1%
360                  5        < 0.1%
354                  1        < 0.1%
300                  6        < 0.1%
299                  1        < 0.1%
275                  1        < 0.1%
270                  2        < 0.1%
265                  1        < 0.1%
250                  1        < 0.1%

number_of_reviews
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 386
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.57901765
Minimum: 0
Maximum: 488
Zeros: 9510
Zeros (%): 19.8%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 5
Q3: 24
95-th percentile: 115
Maximum: 488
Range: 488
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 44.30524514
Coefficient of variation (CV): 1.879011492
Kurtosis: 16.28536397
Mean: 23.57901765
Median Absolute Deviation (MAD): 5
Skewness: 3.481772224
Sum: 1130048
Variance: 1962.954747
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count    Frequency (%)
0                     9510     19.8%
1                     5122     10.7%
2                     3408     7.1%
3                     2487     5.2%
4                     1970     4.1%
5                     1592     3.3%
6                     1333     2.8%
7                     1160     2.4%
8                     1103     2.3%
9                     943      2.0%
Other values (376)    19298    40.3%

Minimum 10 values

Value                 Count    Frequency (%)
0                     9510     19.8%
1                     5122     10.7%
2                     3408     7.1%
3                     2487     5.2%
4                     1970     4.1%
5                     1592     3.3%
6                     1333     2.8%
7                     1160     2.4%
8                     1103     2.3%
9                     943      2.0%

Maximum 10 values

Value                 Count    Frequency (%)
488                   1        < 0.1%
480                   1        < 0.1%
474                   1        < 0.1%
467                   1        < 0.1%
466                   1        < 0.1%
459                   1        < 0.1%
458                   1        < 0.1%
454                   1        < 0.1%
451                   1        < 0.1%
448                   1        < 0.1%

last_review
DateTime

Distinct: 1763
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 374.5 KiB
Minimum: 1976-01-01 01:00:00+00:00
Maximum: 2019-07-08 00:00:00+00:00
Histogram with fixed size bins (bins=50)

reviews_per_month
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 935
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.100549806
Minimum: 0
Maximum: 58.5
Zeros: 9510
Zeros (%): 19.8%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.05
Median: 0.38
Q3: 1.61
95-th percentile: 4.33
Maximum: 58.5
Range: 58.5
Interquartile range (IQR): 1.56

Descriptive statistics

Standard deviation: 1.599773144
Coefficient of variation (CV): 1.453612672
Kurtosis: 43.67746979
Mean: 1.100549806
Median Absolute Deviation (MAD): 0.38
Skewness: 3.28091007
Sum: 52744.95
Variance: 2.559274112
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count    Frequency (%)
0                     9510     19.8%
0.02                  911      1.9%
0.05                  883      1.8%
1                     876      1.8%
0.03                  797      1.7%
0.16                  661      1.4%
0.04                  646      1.3%
0.08                  593      1.2%
0.09                  586      1.2%
0.06                  572      1.2%
Other values (925)    31891    66.5%

Minimum 10 values

Value                 Count    Frequency (%)
0                     9510     19.8%
0.01                  42       0.1%
0.02                  911      1.9%
0.03                  797      1.7%
0.04                  646      1.3%
0.05                  883      1.8%
0.06                  572      1.2%
0.07                  458      1.0%
0.08                  593      1.2%
0.09                  586      1.2%

Maximum 10 values

Value                 Count    Frequency (%)
58.5                  1        < 0.1%
27.95                 1        < 0.1%
20.94                 1        < 0.1%
19.75                 1        < 0.1%
17.82                 1        < 0.1%
16.81                 1        < 0.1%
16.03                 1        < 0.1%
15.78                 1        < 0.1%
15.32                 1        < 0.1%
15.23                 1        < 0.1%

calculated_host_listings_count
Real number (ℝ≥0)

Distinct: 43
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.397091349
Minimum: 1
Maximum: 96
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 2
95-th percentile: 9
Maximum: 96
Range: 95
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 9.918211126
Coefficient of variation (CV): 2.919618611
Kurtosis: 54.33235713
Mean: 3.397091349
Median Absolute Deviation (MAD): 0
Skewness: 6.988563317
Sum: 162809
Variance: 98.37091194
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=43)

Common values

Value                Count    Frequency (%)
1                    32182    67.1%
2                    6641     13.9%
3                    2843     5.9%
4                    1436     3.0%
5                    840      1.8%
6                    558      1.2%
8                    416      0.9%
7                    399      0.8%
9                    234      0.5%
10                   210      0.4%
Other values (33)    2167     4.5%

Minimum 10 values

Value                Count    Frequency (%)
1                    32182    67.1%
2                    6641     13.9%
3                    2843     5.9%
4                    1436     3.0%
5                    840      1.8%
6                    558      1.2%
7                    399      0.8%
8                    416      0.9%
9                    234      0.5%
10                   210      0.4%

Maximum 10 values

Value                Count    Frequency (%)
96                   192      0.4%
91                   91       0.2%
87                   87       0.2%
65                   65       0.1%
52                   104      0.2%
50                   50       0.1%
49                   98       0.2%
47                   47       0.1%
43                   43       0.1%
39                   39       0.1%

availability_365
Real number (ℝ≥0)

ZEROS

Distinct: 366
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 109.8798564
Minimum: 0
Maximum: 365
Zeros: 17464
Zeros (%): 36.4%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 41
Q3: 217
95-th percentile: 359
Maximum: 365
Range: 365
Interquartile range (IQR): 217

Descriptive statistics

Standard deviation: 130.3937594
Coefficient of variation (CV): 1.186693937
Kurtosis: -0.9174765325
Mean: 109.8798564
Median Absolute Deviation (MAD): 41
Skewness: 0.8067430482
Sum: 5266102
Variance: 17002.5325
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count    Frequency (%)
0                     17464    36.4%
365                   1233     2.6%
364                   468      1.0%
1                     401      0.8%
89                    358      0.7%
5                     340      0.7%
3                     306      0.6%
179                   297      0.6%
90                    285      0.6%
2                     269      0.6%
Other values (356)    26505    55.3%

Minimum 10 values

Value                 Count    Frequency (%)
0                     17464    36.4%
1                     401      0.8%
2                     269      0.6%
3                     306      0.6%
4                     233      0.5%
5                     340      0.7%
6                     244      0.5%
7                     218      0.5%
8                     233      0.5%
9                     193      0.4%

Maximum 10 values

Value                 Count    Frequency (%)
365                   1233     2.6%
364                   468      1.0%
363                   228      0.5%
362                   160      0.3%
361                   107      0.2%
360                   100      0.2%
359                   131      0.3%
358                   175      0.4%
357                   89       0.2%
356                   75       0.2%

month
Real number (ℝ≥0)

ZEROS

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.316404457
Minimum: 0
Maximum: 11
Zeros: 2086
Zeros (%): 4.4%
Negative: 0
Negative (%): 0.0%
Memory size: 374.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 4
Median: 5
Q3: 6
95-th percentile: 10
Maximum: 11
Range: 11
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.403657632
Coefficient of variation (CV): 0.4521209121
Kurtosis: 0.3200809153
Mean: 5.316404457
Median Absolute Deviation (MAD): 1
Skewness: 0.07051643411
Sum: 254794
Variance: 5.777570012
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values

Value               Count    Frequency (%)
6                   13390    27.9%
4                   12243    25.5%
5                   5920     12.4%
8                   4635     9.7%
0                   2086     4.4%
2                   1760     3.7%
1                   1658     3.5%
10                  1534     3.2%
11                  1528     3.2%
7                   1268     2.6%
Other values (2)    1904     4.0%

Minimum 10 values

Value               Count    Frequency (%)
0                   2086     4.4%
1                   1658     3.5%
2                   1760     3.7%
3                   756      1.6%
4                   12243    25.5%
5                   5920     12.4%
6                   13390    27.9%
7                   1268     2.6%
8                   4635     9.7%
9                   1148     2.4%

Maximum 10 values

Value               Count    Frequency (%)
11                  1528     3.2%
10                  1534     3.2%
9                   1148     2.4%
8                   4635     9.7%
7                   1268     2.6%
6                   13390    27.9%
5                   5920     12.4%
4                   12243    25.5%
3                   756      1.6%
2                   1760     3.7%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
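
The rank-then-correlate recipe above can be sketched in plain Python as follows. This is an illustrative implementation, not the code the report tool runs; the helper names `average_ranks`, `pearson_r` and `spearman_rho` are assumptions, and ties are handled by average ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"Spearman rho = {rho:.4f}, Pearson r = {r:.4f}")
```

Because y increases whenever x does, ρ reaches its maximum while Pearson's r stays below 1, which illustrates the "nonlinear monotonic" remark above.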

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
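
As a hedged illustration of that formula and of the invariance property, the sketch below computes r on a small made-up sample and again after shifting and rescaling each variable separately. The function name and the data are illustrative; the scale factors are positive, since a negative factor would flip the sign of r.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / (sx * sy)


# Made-up sample for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
r_original = pearson_r(x, y)

# Change location and scale of each variable independently.
x_transformed = [10 * a + 3 for a in x]
y_transformed = [0.5 * b - 7 for b in y]
r_transformed = pearson_r(x_transformed, y_transformed)

print(f"r before: {r_original:.6f}, r after: {r_transformed:.6f}")
```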

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
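
The pair-counting definition can be sketched directly. Note the assumption: this is the simple variant described in the text (often called tau-a), in which tied pairs count as neither concordant nor discordant. Statistical libraries commonly report a tie-adjusted variant instead, so results may differ on data with ties. The function name `kendall_tau_a` and the sample are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Made-up sample: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
tau = kendall_tau_a(x, y)
print(f"Kendall tau = {tau}")
```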

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
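
A minimal sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013) is shown below. It is an illustration under stated assumptions rather than the report tool's own code: the function name `cramers_v_corrected` is hypothetical, the input is a contingency table given as a list of rows, and every row and column is assumed to have a non-zero total.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Made-up tables: proportional rows (independence) and a diagonal table.
v_independent = cramers_v_corrected([[10, 20], [20, 40]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(f"independent: {v_independent:.3f}, perfect: {v_perfect:.3f}")
```

The two made-up tables hit the ends of the range described above: proportional rows give 0, and a diagonal table gives 1.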

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

df_index  id  name  host_id  host_name  neighbourhood_group  neighbourhood  latitude  longitude  room_type  price  minimum_nights  number_of_reviews  last_review  reviews_per_month  calculated_host_listings_count  availability_365  month
02167117437106Couch in Harlem Harvey Refugees only33511962Morgan29440.81302-73.95349210101976-01-01 01:00:00+00:000.00104
14718135642891Beautiful room in Bushwick268138154Julio12840.69640-73.91898110122019-06-18 00:00:00+00:002.00106
22322718835820Quiet, Cozy UES Studio Near the Subway52777892Amy220140.76844-73.953410103102018-10-22 00:00:00+00:000.391010
33277425839759Gigantic Sunny Room in Park Slope-Private Backyard167570251Rachel119040.66242-73.994640101142018-10-28 00:00:00+00:001.061410
42225817979764Jen Apt84497333Jennifer217840.72237-73.99817110522017-04-15 00:00:00+00:000.07100
53318926235873Voted #1 Airbnb In NYC197169969Maria310540.68939-73.798860102222019-07-06 00:00:00+00:001.7613325
62223217952277Newly renovated, fully furnished room in Brooklyn62685070Katie12840.69974-73.91935110501976-01-01 01:00:00+00:000.00104
73346926496645Room with a view110049861Martin121440.70959-73.95693110101976-01-01 01:00:00+00:000.001834
83103124114389Very Spacious bedroom, steps from CENTRAL PARK.180661875Salim220240.76844-73.98333110122018-04-23 00:00:00+00:000.13100
928571620248Large furnished 2 bedrooms- - 30 days Minimum2196224Sally26440.73051-73.981400103001976-01-01 01:00:00+00:000.0041374

Last rows

df_index  id  name  host_id  host_name  neighbourhood_group  neighbourhood  latitude  longitude  room_type  price  minimum_nights  number_of_reviews  last_review  reviews_per_month  calculated_host_listings_count  availability_365  month
479161370210314411Ultra-Modern 6-bedroom House (Great for Groups)4393578Jack23440.74234-74.000320173110842019-03-31 00:00:00+00:002.802977
479174849836311055Stunning & Stylish Brooklyn Luxury, near Train245712163Urvashi11340.68245-73.9341701749101976-01-01 01:00:00+00:000.0013034
479184004931120563Magnificent 5 Bedroom Brooklyn Townhouse9786357Ilsa115840.68187-73.9707501750252019-07-01 00:00:00+00:001.6513585
479194610335109246Bedroom in heart of Cobble Hill BK w/ private roof88988332Dorothea14340.68929-73.99656117503001976-01-01 01:00:00+00:000.0011794
479204685835481836Duplex Residence with Breathtaking NYC View!232557023NuAve27540.73942-73.9864801750701976-01-01 01:00:00+00:000.0011784
4792170845131879Large, comfy 1br in Williamsburg!26529661Alexis121440.71270-73.94643017636001976-01-01 01:00:00+00:000.00104
479222496420016493ART LOFT/HOME: DINNERS, GATHERINGS, PHOTO142118455Allan214440.72560-73.99487017951382019-06-21 00:00:00+00:001.6511166
479234352633718046Prime Location!Cozy 2BR in West Village!253192168Romeo220940.73037-74.0049701799201976-01-01 01:00:00+00:000.0012704
479241435711234747Mins away to Manhattan Suite Residence24146326Julien3440.76626-73.9305421800352017-04-09 00:00:00+00:000.132900
479254585434997894entire apartment for rent263670476Algerchaabi1740.63360-74.02884018003001976-01-01 01:00:00+00:000.0013354